Unsupervised Learning: Trade&Ahead

Marks: 60

Context

The stock market has consistently proven to be a good place to invest in and save for the future. There are many compelling reasons to invest in stocks: they can help fight inflation, create wealth, and provide some tax benefits. Steady returns sustained over a long period can grow far more than one might expect, and thanks to the power of compound interest, the earlier one starts investing, the larger the corpus one can build for retirement. Overall, investing in stocks can help meet life's financial aspirations.

It is important to maintain a diversified portfolio when investing in stocks in order to maximise earnings under any market condition. A diversified portfolio tends to yield higher returns and carry lower risk, because it tempers potential losses when the market is down. It is easy to get lost in a sea of financial metrics when determining the worth of a stock, and repeating that analysis across many stocks to identify the right picks for an individual investor can be tedious. Cluster analysis can identify groups of stocks that exhibit similar characteristics, as well as stocks that show minimal correlation with one another. This helps investors analyze stocks across different market segments and protect against risks that could leave the portfolio vulnerable to losses.

Objective

Trade&Ahead is a financial consultancy firm that provides its customers with personalized investment strategies. They have hired you as a Data Scientist and provided you with data comprising stock prices and some financial indicators for a few companies listed on the New York Stock Exchange. They have assigned you the tasks of analyzing the data, grouping the stocks based on the attributes provided, and sharing insights about the characteristics of each group.

Data Dictionary

Importing necessary libraries and data

Data Overview

Checking the Shape of the Dataset

Displaying a Few Rows of the Dataset

Checking the Data Types of the Columns

Checking for Duplicates and Missing Values

Statistical Summary of the Dataset
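A minimal sketch of these overview checks with pandas. The four-row frame below is a hypothetical placeholder for the real data (normally loaded with something like `pd.read_csv`, whose file name this document does not give); the column names are borrowed from the headings further down, and only the pandas calls themselves are meant to carry over.

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder rows, NOT the project's data.
# In the notebook this would come from pd.read_csv(...) on the provided file.
df = pd.DataFrame(
    {
        "Ticker Symbol": ["AAA", "BBB", "CCC", "CCC"],
        "GICS Sector": ["Energy", "Financials", "Energy", "Energy"],
        "Current Price": [42.0, np.nan, 18.2, 18.2],
        "Price Change": [-3.1, 7.4, 12.0, 12.0],
    }
)

print(df.shape)    # (number of rows, number of columns)
print(df.head())   # first few rows
print(df.dtypes)   # data type of each column

n_duplicates = int(df.duplicated().sum())  # rows that repeat an earlier row exactly
missing_per_column = df.isnull().sum()     # NaN count per column
print("duplicate rows:", n_duplicates)
print(missing_per_column)

# Statistical summary, transposed so each column becomes one row.
print(df.describe(include="all").T)
```

Here the last row repeats the third, so one duplicate is reported, and `Current Price` has one missing value.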

Exploratory Data Analysis (EDA)

Questions:

  1. What does the distribution of stock prices look like?
  2. Which economic sector's stocks have seen the largest average price increase?
  3. How are the different variables correlated with each other?
  4. Cash ratio provides a measure of a company's ability to cover its short-term obligations using only cash and cash equivalents. How does the average cash ratio vary across economic sectors?
  5. P/E ratios can help determine the relative value of a company's shares, as they signify the amount an investor is willing to pay for a single share per dollar of the company's earnings. How does the P/E ratio vary, on average, across economic sectors?
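For reference, the two ratios in questions 4 and 5 are conventionally defined as below. This assumes the dataset's `Cash Ratio` and `P/E Ratio` columns follow these standard definitions, which the document itself does not state.

```latex
\text{Cash Ratio} = \frac{\text{Cash} + \text{Cash Equivalents}}{\text{Current Liabilities}}

\text{P/E Ratio} = \frac{\text{Price per Share}}{\text{Earnings per Share}}
```

For example, a share priced at $50 with earnings per share of $2.50 has a P/E ratio of 50 / 2.50 = 20, meaning investors pay $20 for each $1 of earnings.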

Univariate Analysis

Current Price

Price Change

Volatility

ROE

Cash Ratio

Net Cash Flow

Net Income

Earnings Per Share

Estimated Shares Outstanding

P/E Ratio

P/B Ratio

GICS Sector

GICS Sub Industry
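Each numeric attribute above can be summarised with the same handful of statistics. The sketch below is an assumption about how to do that, not the notebook's own code: `univariate_summary` is a hypothetical helper, and the log-normal prices are synthetic stand-ins for the `Current Price` column. A mean well above the median together with positive skew indicates a long right tail, which is one way to answer question 1.

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed prices standing in for the 'Current Price' column.
rng = np.random.default_rng(0)
df = pd.DataFrame({"Current Price": rng.lognormal(mean=4.0, sigma=0.8, size=300)})


def univariate_summary(series: pd.Series) -> pd.Series:
    """Hypothetical helper: centre, spread, and shape of one numeric column."""
    q1, q3 = series.quantile([0.25, 0.75])
    return pd.Series(
        {
            "mean": series.mean(),
            "median": series.median(),
            "std": series.std(),
            "iqr": q3 - q1,
            "skew": series.skew(),  # > 0 means a longer right tail
        }
    )


summary = univariate_summary(df["Current Price"])
print(summary.round(2))
```

The same helper can be applied to each numeric column listed above; for the two categorical columns, `df["GICS Sector"].value_counts()` and `df["GICS Sub Industry"].value_counts()` are the usual counterparts.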

Bivariate Analysis

Let's check which economic sector's stocks have seen the largest average price increase.

Cash ratio provides a measure of a company's ability to cover its short-term obligations using only cash and cash equivalents. Let's see how the average cash ratio varies across economic sectors.

P/E ratios can help determine the relative value of a company's shares, as they signify the amount an investor is willing to pay for a single share per dollar of the company's earnings. Let's see how the P/E ratio varies, on average, across economic sectors.

Volatility accounts for the fluctuation in the stock price. A stock with high volatility will witness sharper price changes, making it a riskier investment. Let's see how volatility varies, on average, across economic sectors.
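All four sector-level questions above reduce to a group-by mean on `GICS Sector`. A minimal sketch, assuming the column names match the headings in this document; the rows are synthetic placeholders, so the printed numbers say nothing about the real sectors.

```python
import numpy as np
import pandas as pd

# Synthetic placeholder rows; only the column names mirror the headings above.
rng = np.random.default_rng(1)
sectors = ["Energy", "Financials", "Health Care", "Information Technology"]
df = pd.DataFrame(
    {
        "GICS Sector": rng.choice(sectors, size=200),
        "Price Change": rng.normal(5, 10, size=200),
        "Cash Ratio": rng.gamma(2.0, 40.0, size=200),
        "P/E Ratio": rng.gamma(3.0, 10.0, size=200),
        "Volatility": rng.gamma(4.0, 0.4, size=200),
    }
)

metrics = ["Price Change", "Cash Ratio", "P/E Ratio", "Volatility"]

# One row per sector, one column per metric, each cell an average.
sector_means = df.groupby("GICS Sector")[metrics].mean()

# Sector with the largest average value of each metric.
top_sector = sector_means.idxmax()

print(sector_means.round(2))
print(top_sector)
```

Sorting a single column (for example `sector_means["Price Change"].sort_values(ascending=False)`) gives the ranking that a bar chart would display.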

Data Preprocessing

Outlier Check
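One common way to run this check is Tukey's 1.5 × IQR rule, applied column by column. The sketch assumes that rule (the document does not say which one it uses); `iqr_outlier_counts` is a hypothetical helper and the data are synthetic, with two extreme ROE values injected on purpose. With financial metrics, extreme values are often genuine companies rather than errors, so flagging them is usually safer than dropping them.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame(
    {
        # 98 ordinary values plus two injected extremes.
        "ROE": np.append(rng.normal(15, 5, size=98), [250.0, 300.0]),
        "Volatility": rng.normal(1.5, 0.3, size=100),
    }
)


def iqr_outlier_counts(frame: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: count values outside [Q1 - k*IQR, Q3 + k*IQR] per column."""
    q1 = frame.quantile(0.25)
    q3 = frame.quantile(0.75)
    iqr = q3 - q1
    # Comparing a DataFrame with a Series aligns the Series on the columns.
    outside = (frame < q1 - k * iqr) | (frame > q3 + k * iqr)
    return outside.sum()


counts = iqr_outlier_counts(df)
print(counts)
```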

Scaling
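Scaling matters here because both K-means and Euclidean-distance linkage compare raw distances: an unscaled column measured in billions (such as `Net Income`) would swamp one measured in single digits (such as `Volatility`). A minimal sketch using scikit-learn's `StandardScaler`, assuming z-score scaling is the intended choice; the data are synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic columns on wildly different scales.
rng = np.random.default_rng(3)
df = pd.DataFrame(
    {
        "Current Price": rng.lognormal(4.0, 0.8, size=50),
        "Net Income": rng.normal(1e9, 5e8, size=50),
        "Volatility": rng.normal(1.5, 0.3, size=50),
    }
)

scaler = StandardScaler()  # subtracts each column's mean, divides by its std
scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns, index=df.index)

print(scaled.describe().round(2))
```

`StandardScaler` divides by the population standard deviation (`ddof=0`), so each scaled column has mean 0 and a population standard deviation of 1.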

K-means Clustering

Checking Elbow Plot
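The elbow plot tracks K-means inertia (the within-cluster sum of squared distances) as k grows; the "elbow" is where adding clusters stops reducing it much. A minimal sketch with scikit-learn, where synthetic `make_blobs` data stand in for the scaled stock features and the range of k is an arbitrary choice.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic data with 4 groups, standing in for the scaled stock features.
X, _ = make_blobs(n_samples=300, centers=4, n_features=5, random_state=0)
X = StandardScaler().fit_transform(X)

inertias = {}
for k in range(1, 10):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_  # within-cluster sum of squared distances

for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:,.1f}")
```

Plotting `list(inertias)` against `list(inertias.values())` with matplotlib gives the elbow plot itself.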

Let's check the silhouette scores
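The silhouette score complements the elbow plot: it ranges from -1 to 1, and higher values mean points sit closer to their own cluster than to the nearest other cluster. A minimal sketch, again on synthetic `make_blobs` data; the k with the highest score is a candidate, to be weighed against the elbow and against how interpretable the resulting clusters are.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=300, centers=4, n_features=5, random_state=0)
X = StandardScaler().fit_transform(X)

scores = {}
for k in range(2, 10):  # the silhouette is undefined for a single cluster
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print({k: round(s, 3) for k, s in scores.items()})
print("highest silhouette at k =", best_k)
```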

Creating Final Model

Cluster Profiling
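A sketch of fitting the final model and profiling it: fit on the scaled features, attach the labels to the unscaled data, and average each attribute per cluster so the profile reads in original units. Assumptions: `n_clusters=5` is used only because the insights below refer to cluster numbers up to 4, `KM_segment` is a hypothetical column name, and the data are synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
df = pd.DataFrame(
    {
        "Current Price": rng.lognormal(4.0, 0.8, size=120),
        "Volatility": rng.normal(1.5, 0.3, size=120),
        "P/E Ratio": rng.gamma(3.0, 10.0, size=120),
    }
)

X = StandardScaler().fit_transform(df)
# Placeholder k; substitute the value chosen from the elbow and silhouette checks.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

profiled = df.copy()
profiled["KM_segment"] = kmeans.labels_  # hypothetical label column

# Profile in original (unscaled) units so the averages are interpretable.
profile = profiled.groupby("KM_segment").mean()
profile["count"] = profiled["KM_segment"].value_counts().sort_index()
print(profile.round(2))
```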

Insights

Cluster 0 - Large Market Capitalization / Dow Jones Industrial Average

Cluster 2 - S&P 500 / Diversification

Cluster 3 - "Ride the Energy Rollercoaster" portfolio / Growth mindset

Cluster 4 - High Earnings for a High Price

Hierarchical Clustering

Computing Cophenetic Correlation

Let's explore different linkage methods with Euclidean distance only.
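The cophenetic correlation measures how faithfully a dendrogram's merge heights preserve the original pairwise distances; values near 1 are better. A minimal sketch with SciPy that compares several linkage methods under Euclidean distance, using synthetic data in place of the scaled stock features. A high value means the tree preserves distances well, which is not the same as producing well-balanced or easily interpreted clusters.

```python
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=200, centers=4, n_features=5, random_state=0)
X = StandardScaler().fit_transform(X)

pairwise = pdist(X, metric="euclidean")  # condensed original distance matrix
results = {}
for method in ["single", "complete", "average", "centroid", "median", "ward"]:
    Z = linkage(X, method=method, metric="euclidean")
    coph_corr, _ = cophenet(Z, pairwise)  # (correlation, cophenetic distances)
    results[method] = coph_corr

for method, c in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{method:>8}: cophenetic correlation = {c:.4f}")
```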

Let's view the dendrograms for the different linkage methods with Euclidean distance.

Checking Dendrograms
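A minimal sketch of drawing one dendrogram per linkage method with SciPy and matplotlib, on synthetic data. Each tree is truncated to its last 30 merges so the plots stay readable with a few hundred stocks; the `Agg` backend line is only there so the sketch runs without a display and should be dropped in a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; remove in a notebook
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=200, centers=4, n_features=5, random_state=0)
X = StandardScaler().fit_transform(X)

methods = ["single", "complete", "average", "ward"]
fig, axes = plt.subplots(len(methods), 1, figsize=(12, 4 * len(methods)))
for ax, method in zip(axes, methods):
    Z = linkage(X, method=method, metric="euclidean")
    # Show only the last 30 merges; leaf labels then show how many points each holds.
    dendrogram(Z, ax=ax, truncate_mode="lastp", p=30)
    ax.set_title(f"Dendrogram ({method} linkage, Euclidean distance)")
fig.tight_layout()
# In a notebook, call plt.show() here.
```

Long vertical gaps before the top merges suggest natural cut points; chaining into one long staircase (typical of single linkage) suggests poor separation.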

Creating Model Using sklearn

Cluster Profiling

In contrast, the dendrogram for Ward linkage appears to provide better clustering, with 5 appearing to be the appropriate number of clusters.
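A minimal sketch of the corresponding scikit-learn model: Ward linkage with 5 clusters, fit on scaled features and profiled on the unscaled ones. The data are synthetic and `HC_segment` is a hypothetical column name. Ward linkage only supports Euclidean distance, so no distance argument is passed (older scikit-learn releases named that argument `affinity`, newer ones `metric`).

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
df = pd.DataFrame(
    {
        "Current Price": rng.lognormal(4.0, 0.8, size=150),
        "Volatility": rng.normal(1.5, 0.3, size=150),
        "Net Income": rng.normal(1e9, 5e8, size=150),
    }
)

X = StandardScaler().fit_transform(df)  # scale before adding the label column

# Ward linkage always uses Euclidean distance.
hc = AgglomerativeClustering(n_clusters=5, linkage="ward")
df["HC_segment"] = hc.fit_predict(X)  # hypothetical label column

print(df.groupby("HC_segment").mean().round(2))
print(df["HC_segment"].value_counts().sort_index())
```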

Cluster Profiling

Insights

Cluster 0 - Growth for a Price

Cluster 1 - Short-term Poor, Long-term Rich

Cluster 2 - DJIA

Cluster 3 - Diversification

Cluster 4 - Energy-specific portfolio

K-means vs Hierarchical Clustering

You can compare several things, such as:

Which clustering technique took less time for execution?
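A minimal sketch for timing the two fits with `time.perf_counter`, on synthetic data of a few hundred rows. `timed_fit` is a hypothetical helper. A single timing at this size is noisy, so repeating each fit (for example with the `timeit` module) gives a fairer comparison; the result also depends on settings such as `n_init`, since K-means here runs 10 initialisations. As a rule of thumb, agglomerative clustering works from pairwise distances and scales at least quadratically with the number of rows, while each K-means iteration scales roughly linearly.

```python
import time

from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=300, centers=5, n_features=8, random_state=0)
X = StandardScaler().fit_transform(X)


def timed_fit(model, data):
    """Hypothetical helper: return (labels, elapsed seconds) for one fit_predict call."""
    start = time.perf_counter()
    labels = model.fit_predict(data)
    return labels, time.perf_counter() - start


km_labels, km_seconds = timed_fit(KMeans(n_clusters=5, n_init=10, random_state=0), X)
hc_labels, hc_seconds = timed_fit(AgglomerativeClustering(n_clusters=5, linkage="ward"), X)

print(f"K-means:      {km_seconds:.4f} s")
print(f"Hierarchical: {hc_seconds:.4f} s")
```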

Which clustering technique gave you more distinct clusters, or are they the same? How many observations are there in the similar clusters of both algorithms?
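Cluster numbers are arbitrary in both algorithms, so matching clusters has to be done by overlap rather than by number; in this document, for instance, the DJIA-like group is Cluster 0 under K-means but Cluster 2 under hierarchical clustering. A minimal sketch using `pd.crosstab` on the two label vectors, with synthetic data standing in for the stocks:

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=300, centers=5, n_features=8, random_state=1)
X = StandardScaler().fit_transform(X)

km_labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
hc_labels = AgglomerativeClustering(n_clusters=5, linkage="ward").fit_predict(X)

# Rows: K-means cluster; columns: hierarchical cluster; cells: observations in both.
overlap = pd.crosstab(
    pd.Series(km_labels, name="K-means"), pd.Series(hc_labels, name="Hierarchical")
)
print(overlap)

# For each K-means cluster, the hierarchical cluster it shares most observations with.
best_match = overlap.idxmax(axis=1)
shared = overlap.max(axis=1)
print(pd.DataFrame({"best HC match": best_match, "shared observations": shared}))
```

For a single-number summary of agreement, `sklearn.metrics.adjusted_rand_score(km_labels, hc_labels)` is insensitive to how the clusters are numbered.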

How many clusters are obtained as the appropriate number of clusters from both algorithms?

You can also mention any differences or similarities you obtained in the cluster profiles from both the clustering techniques.

Actionable Insights and Recommendations

-